Causal Ladder to Heaven - Different Ways to Look at Data

Welcome back to our series of introductions to causal inference. In this second article, we will climb the causal ladder, a framework proposed by computer scientist Judea Pearl for sorting causal questions into different types and working out what it takes to answer each one. You won't need anything beyond high-school math to follow along, and we'll keep things light.

What is the causal ladder?

The causal ladder is a hierarchy of three levels, or rungs, each corresponding to a different kind of causal question:

Association: This is the lowest level, where we only observe data and look for patterns or correlations. For example, we might notice that people who smoke get lung cancer more often than people who don't. On its own, this pattern does not prove that smoking causes lung cancer; it only tells us that the two variables are associated.
Intervention: This is the middle level, where we actively change something in the world and see what happens. For example, in a hypothetical experiment (one that would be unethical to run for real), we could randomly assign some people to smoke and others not to, and then compare their lung cancer rates. Because a coin flip, rather than each person's own habits or genes, decides who smokes, this lets us estimate the causal effect of smoking on lung cancer, which can differ from the association we saw before.
Counterfactuals: This is the highest level, where we imagine what would have happened if something had been different. For example, a smoker with lung cancer might ask: what if I had not smoked for the past 10 years? Would I still have lung cancer? This is a counterfactual question because it asks about an alternative version of something that has already happened, which we can never observe directly.
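To make the three rungs concrete, here is a minimal Python sketch built on a tiny, entirely made-up model of the world. The hypothetical gene, all the probabilities, and the helper names (`smokes`, `gets_cancer`, `climb_the_ladder`) are illustrative assumptions, not real medical data. Because we wrote the model ourselves, we can answer one question from each rung: compare smokers with non-smokers (association), force everyone to smoke or not smoke (intervention), and replay the very same smokers who got cancer in a world where they never smoked (counterfactual).

```python
import random

# A made-up structural causal model. Every number here is illustrative, not real data.
def smokes(gene, u_smoke):
    # The hypothetical gene makes smoking more likely.
    return 1 if u_smoke < 0.2 + 0.5 * gene else 0

def gets_cancer(gene, smoke, u_cancer):
    # In this toy model, both smoking and the gene raise the chance of cancer.
    return 1 if u_cancer < 0.05 + 0.10 * smoke + 0.20 * gene else 0

def rate(outcomes):
    return sum(outcomes) / len(outcomes)

def climb_the_ladder(n=100_000, seed=0):
    rng = random.Random(seed)
    # Each person's hidden background: gene, plus random noise for smoking and for cancer.
    people = [(1 if rng.random() < 0.3 else 0, rng.random(), rng.random())
              for _ in range(n)]

    # Rung 1, association: watch what people do on their own.
    world = []
    for gene, u_s, u_c in people:
        s = smokes(gene, u_s)
        world.append((gene, u_c, s, gets_cancer(gene, s, u_c)))
    association = (rate([c for _, _, s, c in world if s == 1])
                   - rate([c for _, _, s, c in world if s == 0]))

    # Rung 2, intervention: force everyone to smoke, then force everyone not to.
    intervention = (rate([gets_cancer(g, 1, u_c) for g, _, u_c in people])
                    - rate([gets_cancer(g, 0, u_c) for g, _, u_c in people]))

    # Rung 3, counterfactual: take people who did smoke AND got cancer, and replay
    # each of them (same gene, same noise) in a world where they never smoked.
    replay = [gets_cancer(g, 0, u_c) for g, u_c, s, c in world if s == 1 and c == 1]
    counterfactual = rate(replay)

    return {"association": association,
            "intervention": intervention,
            "counterfactual": counterfactual}

print(climb_the_ladder())
```

With these made-up numbers, the association comes out near 0.19, the intervention effect near 0.10, and the counterfactual share near 0.63: in this toy world, about 63% of the smokers who got cancer would have gotten it anyway. The replay trick on rung 3 only works because the simulation knows each person's hidden background; with real people, we would have to infer that background from a model, which is why counterfactuals need the most assumptions.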

The causal ladder shows us that not all causal questions are the same, and that some are harder to answer than others. The higher we climb, the more information and assumptions we need: in general, data from a lower rung cannot, on its own, answer a question from a higher rung.

Why is association different from causality?

Association is a statistical concept that measures how two variables are related in the data. For example, we can calculate the correlation coefficient between smoking and lung cancer, which tells us how strongly the two tend to go up and down together. However, association does not imply causation, because other factors might affect both variables. For example, a genetic factor might make some people both more likely to smoke and more likely to develop lung cancer. If that gene were the only link between them, smoking and lung cancer would be associated even though smoking did not cause cancer.
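The genetic-factor scenario can be simulated in a few lines of Python. This sketch assumes a made-up population in which a hypothetical gene raises both the chance of smoking and the chance of cancer, while smoking itself has no effect at all; the probabilities and function names are illustrative only. The correlation coefficient between smoking and cancer still comes out positive overall, but it drops to roughly zero once we look inside each gene group separately.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def confounded_population(n=100_000, seed=1):
    """Made-up population in which smoking has NO effect on cancer at all."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        gene = 1 if rng.random() < 0.3 else 0
        smoke = 1 if rng.random() < 0.2 + 0.5 * gene else 0    # gene -> smoking
        cancer = 1 if rng.random() < 0.05 + 0.20 * gene else 0  # gene -> cancer only
        rows.append((gene, smoke, cancer))
    return rows

rows = confounded_population()
overall = pearson([s for _, s, _ in rows], [c for _, _, c in rows])
# Repeat the calculation separately for non-carriers (0) and carriers (1).
by_gene = {g: pearson([s for gg, s, _ in rows if gg == g],
                      [c for gg, _, c in rows if gg == g])
           for g in (0, 1)}
print(f"overall correlation: {overall:.3f}")
print(f"within each gene group: {by_gene}")
```

With these numbers, the overall correlation lands near 0.14, even though, by construction, smoking does nothing in this toy world.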

Causality is a deeper concept that tells us how one variable affects another variable in reality. For example, we can say that smoking causes lung cancer if making people smoke would raise their chance of getting lung cancer compared with making the same people not smoke, with everything else left as it was. To establish causality, we need more than passive observation: either we intervene in the system itself (for example, with a randomized experiment), or we combine observational data with assumptions about what causes what.
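Here is a minimal sketch of why intervening helps, using a made-up model (the gene, the probabilities, and the function name `smoking_effect_estimate` are all illustrative assumptions). In this toy world, smoking truly adds 0.10 to the chance of cancer. When people choose for themselves whether to smoke, the gene sneaks into the comparison; when a coin flip decides, it cannot.

```python
import random

def smoking_effect_estimate(randomized, n=100_000, seed=2):
    """Cancer rate among smokers minus cancer rate among non-smokers.

    Uses a made-up model where smoking truly adds 0.10 to the cancer risk and a
    hypothetical gene adds 0.20 and also makes people more likely to smoke.
    """
    rng = random.Random(seed)
    people = {0: 0, 1: 0}   # smoking status -> number of people
    cancers = {0: 0, 1: 0}  # smoking status -> number of cancer cases
    for _ in range(n):
        gene = 1 if rng.random() < 0.3 else 0
        if randomized:
            smoke = 1 if rng.random() < 0.5 else 0               # a coin flip decides
        else:
            smoke = 1 if rng.random() < 0.2 + 0.5 * gene else 0  # people choose; gene matters
        cancer = 1 if rng.random() < 0.05 + 0.10 * smoke + 0.20 * gene else 0
        people[smoke] += 1
        cancers[smoke] += cancer
    return cancers[1] / people[1] - cancers[0] / people[0]

print("just observing:  ", round(smoking_effect_estimate(randomized=False), 3))
print("randomized trial:", round(smoking_effect_estimate(randomized=True), 3))
```

The observational comparison lands near 0.19, almost double the true effect, while the simulated randomized trial lands near 0.10.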

Why is it hard to find causality from observational data?

Observational data are data that we collect without interfering with the system or process that generates them. For example, we can record how many people smoke and how many people have lung cancer in a population. Observational data are great for finding associations, but on their own they are not enough to establish causality. This is because observational data are often confounded: some other factor affects both the suspected cause and the effect. For example, as we mentioned before, a genetic factor might confound the relationship between smoking and lung cancer. If we cannot measure that factor, we cannot tell how much of the association comes from smoking and how much comes from the gene.
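The sketch below, again using made-up numbers and hypothetical function names, shows what confounding does to observational data. A naive comparison of smokers and non-smokers overstates the effect of smoking, because smokers are more likely to carry the gene. Comparing smokers and non-smokers within each gene group and then averaging recovers the true effect of 0.10, but only because this simulation lets us see the gene and because, by construction, the gene is the only confounder. In real observational data, the confounder is often unmeasured or even unknown.

```python
import random

def observational_data(n=100_000, seed=3):
    """Made-up records of (gene, smoke, cancer). Smoking truly adds 0.10 to the risk."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        gene = 1 if rng.random() < 0.3 else 0
        smoke = 1 if rng.random() < 0.2 + 0.5 * gene else 0
        cancer = 1 if rng.random() < 0.05 + 0.10 * smoke + 0.20 * gene else 0
        rows.append((gene, smoke, cancer))
    return rows

def cancer_rate(rows, smoke, gene=None):
    """Cancer rate among people with the given smoking status (and gene, if given)."""
    group = [c for g, s, c in rows if s == smoke and (gene is None or g == gene)]
    return sum(group) / len(group)

def naive_difference(rows):
    # Compare all smokers with all non-smokers, ignoring the gene.
    return cancer_rate(rows, 1) - cancer_rate(rows, 0)

def gene_adjusted_difference(rows):
    # Compare smokers with non-smokers *within* each gene group, then average the
    # differences, weighting each group by its share of the population.
    total = 0.0
    for g in (0, 1):
        share = sum(1 for row in rows if row[0] == g) / len(rows)
        total += share * (cancer_rate(rows, 1, g) - cancer_rate(rows, 0, g))
    return total

rows = observational_data()
print("naive:        ", round(naive_difference(rows), 3))
print("gene-adjusted:", round(gene_adjusted_difference(rows), 3))
```

The naive difference comes out near 0.19 and the gene-adjusted difference near 0.10. If the gene column were missing from the data, only the misleading naive number would be available.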

Conclusion

In this article, we learned about the causal ladder, a framework that helps us understand different types of causal questions and how to answer them. We also learned why association is different from causality, and why it is hard to find causality from observational data. Stay tuned and happy climbing!